Context-Dependent MLPs for LVCSR: TANDEM, Hybrid or Both?
Abstract
Gaussian Mixture Model (GMM) and Multi-Layer Perceptron (MLP) based acoustic models are compared on a French large vocabulary continuous speech recognition (LVCSR) task. In addition to optimizing the output layer size of the MLP, the effect of the deep neural network structure is investigated. Using different linear transformations (time derivatives, LDA, CMLLR) of conventional MFCCs, the study is also extended to MLP-based probabilistic and bottleneck TANDEM features. Results show that the hybrid and the bottleneck TANDEM approaches lead to similar recognition performance. However, the best performance is achieved when deep MLP acoustic models are trained on concatenated cepstral and context-dependent bottleneck features. Further experiments reveal the importance of the neighbouring frames for MLP-based modeling, and show that the gain of MLP over GMM acoustic models is strongly reduced by more complex features.
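The abstract mentions three operations that are easy to picture in code: stacking neighbouring frames at the MLP input, concatenating cepstral and bottleneck feature streams, and (in the hybrid approach) using MLP outputs in place of GMM likelihoods. The following is a minimal NumPy sketch under stated assumptions, not the authors' implementation. The function names, the edge-padding choice, and all dimensions are hypothetical; dividing posteriors by state priors is the conversion commonly used in hybrid MLP/HMM systems and is assumed here rather than taken from the paper.

```python
import numpy as np


def splice_frames(feats, context):
    """Stack each frame with `context` neighbours on each side.

    feats: array of shape (num_frames, dim).
    Returns an array of shape (num_frames, (2 * context + 1) * dim).
    Edge frames are padded by repeating the first/last frame (an assumption).
    """
    num_frames = feats.shape[0]
    padded = np.pad(feats, ((context, context), (0, 0)), mode="edge")
    # Window i holds, for every frame t, the frame at offset (i - context).
    windows = [padded[i:i + num_frames] for i in range(2 * context + 1)]
    return np.concatenate(windows, axis=1)


def concat_streams(cepstral, bottleneck):
    """Concatenate two frame-synchronous feature streams along the feature axis."""
    return np.concatenate([cepstral, bottleneck], axis=1)


def scaled_log_likelihoods(log_posteriors, log_priors):
    """Hybrid-style conversion: log p(x|s) is log P(s|x) - log P(s) up to a constant."""
    return log_posteriors - log_priors


# Hypothetical sizes: 4 frames, 3-dim cepstral stream, 2-dim bottleneck stream.
cepstral = np.arange(12, dtype=float).reshape(4, 3)
bottleneck = np.ones((4, 2))
mlp_input = splice_frames(concat_streams(cepstral, bottleneck), context=1)
```

With a context of one frame on each side, every row of `mlp_input` holds the previous, current, and next frame of the concatenated 5-dimensional stream, so its shape is `(4, 15)`.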
Related papers
Cross-lingual and multi-stream posterior features for low resource LVCSR systems
We investigate approaches for building large vocabulary continuous speech recognition (LVCSR) systems for new languages or new domains using limited amounts of transcribed training data. In these low resource conditions, the performance of conventional LVCSR systems degrades significantly. We propose to train low resource LVCSR systems with additional sources of information like annotated data from other l...
Context-Dependent Multiple Distribution Phonetic Modeling with MLPs
Berkeley, CA 94704 Victor Abrash SRI International A number of hybrid multilayer perceptron (MLP)/hidden Markov model (HMM) speech recognition systems have been developed in recent years (Morgan and Bourlard, 1990). In this paper, we present a new MLP architecture and training algorithm which allows the modeling of context-dependent phonetic classes in a hybrid MLP/HMM framework. The new trai...
Hermitian based Hidden Activation Functions for Adaptation of Hybrid HMM/ANN Models
This work is concerned with speaker adaptation techniques for artificial neural network (ANN) implemented as feed-forward multi-layer perceptrons (MLPs) in the context of large vocabulary continuous speech recognition (LVCSR). Most successful speaker adaptation techniques for MLPs consist of augmenting the neural architecture with a linear transformation network connected to either the input or...
Neural network acoustic models for the DARPA RATS program
We present a comparison of acoustic modeling techniques for the DARPA RATS program in the context of spoken term detection (STD) on speech data with severe channel distortions. Our main findings are that both Multi-Layer Perceptrons (MLPs) and Convolutional Neural Networks (CNNs) outperform Gaussian Mixture Models (GMMs) on a very difficult LVCSR task. We discuss pre-training, feature sets and ...
Hybrid neural network/hidden Markov model continuous-speech recognition
In this paper we present a hybrid multilayer perceptron (MLP)/hidden Markov model (HMM) speaker-independent continuous-speech recognition system, in which the advantages of both approaches are combined by using MLPs to estimate the state-dependent observation probabilities of an HMM. New MLP architectures and training procedures are presented which allow the modeling of multiple distributio...